-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: MultiIndex.union dropping duplicates from result #38977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
…ltiindex_union � Conflicts: � pandas/_libs/lib.pyx � pandas/tests/indexes/multi/test_setops.py � pandas/tests/libs/test_lib.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Any performance implications?
Thx, code itself is good, but this depends on #40862, after this is merged we can remove the sort_values in the tests. Will run asvs afterwards to check for performance implications |
wait for #40862 right? |
Yes |
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -714,6 +714,7 @@ MultiIndex | |||
- Bug in :meth:`MultiIndex.intersection` duplicating ``NaN`` in result (:issue:`38623`) | |||
- Bug in :meth:`MultiIndex.equals` incorrectly returning ``True`` when :class:`MultiIndex` containing ``NaN`` even when they are differently ordered (:issue:`38439`) | |||
- Bug in :meth:`MultiIndex.intersection` always returning empty when intersecting with :class:`CategoricalIndex` (:issue:`38653`) | |||
- Bug in :meth:`MultiIndex.union` dropping duplicates from result (:issue:`38977`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put this on the original whatsnew note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, will adjust here after the other is merged and ping then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending merge conflicts
…ltiindex_union � Conflicts: � doc/source/whatsnew/v1.3.0.rst � pandas/tests/indexes/test_setops.py
Unfortunately this looks like a pretty high performance penalty
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO I think this is okay given that it fixes a bug. cc @jreback if this is okay despite the performance penalty.
where is the perf diff coming from now? |
The Index union for duplicates is not that fast |
ok this is fine if this slowdown only occurs when we have duplicates (which i think is the case) |
I'll check the benchmarks |
Unfortunately the slowdown is for also for non monotonic indexes. We could do it with only dispatching to Index.union in case of duplicates or nans, but this is makes it more complicated unfortunately.
|
@phofl can you merge master |
Done |
thanks @phofl |
This more or less sits on top of #36299. The current base-_union implementation works only for sorted indexes correctly if indexes containing duplicates. Hence I've only added tests for sorted indexes,